Cost‐saving population genomic investigation of Daphnia longispina complex resting eggs using whole‐genome amplification and pre‐sequencing screening

Abstract Resting stages of aquatic organisms that accumulate in the sediment over time are an exceptional resource that allows direct insights into past populations and addressing evolutionary questions. This is of particular interest in taxa that face relatively new environmental challenges, e.g., climate change and eutrophication, such as the Daphnia longispina species complex, a keystone zooplankton group in European freshwater ecosystems. However, genomic analysis might be challenging as DNA yield from many of these resting stages can be low and the material degraded. To reliably allow the resequencing of single Daphnia resting eggs from different sediment layers and characterize genomic changes through time, we performed whole‐genome amplification to obtain DNA amounts suitable for genome resequencing and tested multiple protocols involving egg isolation, whole‐genome amplification kits, and library preparation. A pre‐sequencing contamination screening was developed, consisting of amplifying mitochondrial Daphnia and bacterial markers, to quickly assess and exclude possibly contaminated samples. In total, we successfully amplified and sequenced nine genomes from Daphnia resting eggs that could be identified as Daphnia longispina species. We analyzed the genome coverage and heterozygosity of these samples to optimize this method for future projects involving population genomic investigation of the resting egg bank.

growing genomic resources and high-throughput sequencing technologies, resting stage banks enabled researchers to investigate local adaptation in rotifers (Franch-Gras et al., 2018), genetic structure in diatoms (Aloisi et al., 2011), and genetic diversity over time in Daphnia magna (Orsini et al., 2016). In freshwater sediment, resting egg banks are often dominated by zooplanktonic crustaceans such as the genus Daphnia (Crustacea: Cladocera) that play a key role in aquatic food webs; they graze on phytoplankton and are a food source for secondary consumers (Lampert & Sommer, 2007).
As cyclical parthenogens, Daphnia are able to switch between asexual and sexual reproduction and the resulting resting eggs can withstand adverse conditions for decades and even centuries (Frisch et al., 2014). In some cases, resting eggs extracted from sediment cores can be hatched and clonal lines brought back to life to investigate temporal and spatial patterns in the recent past (Orsini et al., 2013). The DNA preserved in those resting eggs can also be directly analyzed with various molecular methods (Cousyn et al., 2001; Lack et al., 2018; Limburg & Weider, 2002) to study adaptation to changing environmental conditions such as temperature (Dziuba et al., 2020) or eutrophication (Alric et al., 2016; Cordellier et al., 2021).
Genomic investigation of the resting egg bank that can be conducted directly without hatching and establishing clonal lines is hindered by small amounts of potentially degraded DNA in single Daphnia resting eggs. Before high-throughput sequencing technologies became widespread in non-model organisms, population genetic studies were restricted to a few nuclear and mitochondrial markers to characterize resting eggs (Brede et al., 2009; Möst et al., 2015; Ortells et al., 2014). However, the small amount of tissue (~3500 cells per diapausing embryo in D. magna; Chen et al., 2018) makes whole-genome sequencing extremely challenging. One possible approach is pooling multiple eggs from a population for whole-genome sequencing, but then information on individual genotypes is lost.
Another approach to obtain sufficient DNA from starting material that is of limited quantity and/or quality is multiple displacement amplification (MDA), a method for whole-genome amplification (WGA) commonly used to perform isothermal amplification of the template DNA. MDA-WGA uses phi29 DNA polymerase and annealing of random hexamers, which does not require species-specific primers and yields an average DNA product length of >10 kb (Dean et al., 2002; Spits et al., 2006). It is the preferred method for SNP detection (de Bourcy et al., 2014) and has been used in other studies where extremely small specimens hinder genomic investigation to successfully perform RADSeq (Cruaud et al., 2018) and whole-genome sequencing (O'Grady et al., 2021). These new methods enabled researchers to study introgression in Schistosome parasites (Platt II et al., 2019) and population genomic structure in water mites (Blattner et al., 2022) and ghost-worms (Cerca et al., 2021). MDA-WGA can also be used to detect copy number variants (Deleye et al., 2017) and most structural variants (Lack et al., 2018). Potential drawbacks are increased cost for WGA kits and GC-dependent amplification bias (Sabina & Leamon, 2015).
In a previous study, Lack et al. (2018) demonstrated that it is possible to use WGA of Daphnia pulicaria resting eggs to achieve DNA concentration suitable for whole-genome sequencing but caution that it should only be used when necessary, e.g., when it is not possible to hatch eggs and sequence genomes of multiple clonal individuals. However, the hatching success of resting eggs is highly species-dependent, with Daphnia magna exhibiting a generally high hatching rate and members of the Daphnia longispina species complex a poor hatching success. Further, within species, other factors such as lake of origin and sediment composition seem to play a role in hatching success (personal observation MC, Radzikowski et al., 2018). In addition, in populations of D. longispina species, hybrid resting eggs are less likely to hatch than their parental species in lab experiments (Schwenk et al., 2001) and natural populations (Keller et al., 2007; Keller & Spaak, 2004). Indeed, hybridization and introgression are common in the D. longispina species complex. This could result in bias towards the parental species when hatching and rearing clonal lines from resting eggs.
In this study, we tested multiple protocols involving egg isolation, WGA kits and library preparation, and developed a contamination screening to reliably sequence genomes from single resting eggs from the D. longispina species complex. We also analyzed the read depth and heterozygosity to optimize this method for future projects involving population genomic investigation of the resting egg bank using recent and historical resting eggs.

| Sampling and isolation
Sandy soil was collected by hand from the shoreline of the eutrophic lake Eichbaumsee, Germany (53° 29′ 6″ N, 10° 6′ 11″ E) and stored at 4°C. The exact age of the soil is unknown but the upper layers most likely contain recent Daphnia eggs from the last few years.
To collect Daphnia eggs, small amounts of sediment were sieved (125 μm mesh size) and resuspended in ddH2O. Ephippia were spotted by eye, counted, and transferred to 1.5 ml tubes under a stereo microscope (Nikon SMZ800N). The water was then removed, and the samples were kept at 4°C in the dark until further processing. The ephippia were transferred to a drop of sterile 1x PBS and opened under a stereo microscope with insect needles and forceps that had previously been treated with UV light in a PCR workstation Pro (VWR) and cleaned with DNA-ExitusPlus (PanReac AppliChem). If an egg was present, a picture was taken and the quality was evaluated by eye based on color into the categories light green or dark green, which is the highest quality we find. Eggs that had an already damaged egg membrane, had an uneven shape, or were orange were discarded (Marková et al., 2006). The resting egg separated from the ephippial casing was then transferred with a pipette to a tube with sterile 1x PBS to wash away the remaining material, and the egg was transferred in 1 μl PBS to a new tube with 2, 3 or 14 μl fresh PBS, depending on the WGA protocol (REPLI-g Mini, Single Cell, and Single Cell with increased sample volume, respectively). The isolated eggs were kept at −20 or −80°C at least overnight until amplification.

| DNA extraction from batch cultures
As an unamplified control for the WGA samples, high-molecular-weight genomic DNA was extracted from 20 pooled adult Daphnia individuals (M5 clone; Nickel et al., 2021) using a modified CTAB extraction method as described in Cristescu et al. (2006).

| Whole-genome amplification of isolated resting eggs
For whole-genome amplification of single eggs, the REPLI-g Single Cell Kit and REPLI-g Mini Kit (Qiagen) were used. Both kits are used for unbiased amplification of genomic loci due to MDA. The REPLI-g Single Cell Kit can be used for samples of 1-1000 intact cells and yields more DNA. The isolated resting eggs were thawed on ice and WGA was performed following the manufacturer's protocols.
Briefly, denaturation buffer was added to the prepared resting eggs in PBS, and the DNA was amplified by phi29 DNA polymerase under isothermal conditions at 30°C for 8 h (REPLI-g Single Cell Kit) or 16 h (REPLI-g Mini Kit); the polymerase was then inactivated at 65°C for 3 min. In addition, a modified protocol for the REPLI-g Single Cell Kit as described by Lack et al. (2018) was used; it is optimized for the amplification of 10-100 ng genomic DNA template and uses an increased sample volume (15 μl).
Eggs were either kept intact or punctured with an insect needle before the amplification to test whether manual crushing had an effect. Nine protocol combinations were tested, differing in the REPLI-g kit, the use of the normal or increased sample volume protocol, the storage temperature, and egg integrity. Each combination was performed on two eggs; the combinations are shown in Table 1 and Figure 1.
The amplified product was quantified on a NanoDrop spectrophotometer (ThermoFisher) to check that the A260/280 and A260/230 values were both >1.8 which indicates DNA purity. The concentration was measured with a Qubit Fluorometer (ThermoFisher) because during the REPLI-g reaction single-stranded DNA is generated by random extension of primer dimers which leads to an overestimation of DNA using a spectrophotometer. Successful amplification product was purified with 0.4x Agencourt AMPure XP magnetic beads (Beckman Coulter) to remove small fragments and eluted in 60 μl 1x TE buffer. The cleaned genomic DNA was then quantified with a Qubit Fluorometer and fragment length was examined on a 4200 TapeStation (Agilent) or Fragment Analyzer (Agilent).
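The purity criterion and bead ratio described above can be written down as a small check. The following is a minimal sketch only: the function names and the 50 μl sample volume in the usage lines are hypothetical and do not come from the protocol, which states only the >1.8 thresholds and the 0.4x bead ratio.

```python
def passes_purity_check(a260_280: float, a260_230: float,
                        threshold: float = 1.8) -> bool:
    """Return True when both absorbance ratios exceed the threshold."""
    return a260_280 > threshold and a260_230 > threshold


def bead_volume_ul(sample_volume_ul: float, ratio: float = 0.4) -> float:
    """Volume of bead suspension needed for a given bead-to-sample ratio."""
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return sample_volume_ul * ratio


if __name__ == "__main__":
    # Hypothetical readings for one WGA product.
    print(passes_purity_check(1.85, 2.10))
    # Hypothetical 50 ul sample volume (not stated in the protocol).
    print(round(bead_volume_ul(50.0), 1))
```

A low bead ratio such as 0.4x retains mainly long fragments, which is consistent with the stated aim of removing small fragments.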
The amplification product was stored at −20°C until library preparation. The presence of Daphnia DNA in the WGA product was checked by amplifying fragments of the mitochondrial 16S rDNA gene using the universal cladoceran primers S1 (5′-CGG CCG CCT GTT TAT CAA AAA CAT-3′) and S2 (5′-GGA GCT CCG GTT TGA ACT CAG ATC-3′) with 1 cycle of 93°C for 2 min 30 s, 55°C for 1 min and 72°C for 2 min followed by 40 cycles of 93°C for 1 min, 55°C for 1 min and 72°C for 2 min, and running a 1.5% agarose gel at 100 V to assess bands (Schwenk et al., 1998). To check for a low presence of bacterial DNA, universal primers for the bacterial 16S rDNA gene were used (5′-TCC TAC GGG AGG CAG CAG T-3′ and 5′-GGA CTA CCA GGG TAT CTA ATC CTG TT-3′) with 1 cycle of 50°C for 2 min and 95°C for 10 min followed by 40 cycles of 95°C for 15 s and 60°C for 1 min (Nadkarni et al., 2002).
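To give a sense of how quickly the two screening PCRs run, the cycling programs above can be encoded as data and their hold times summed. This is an illustrative sketch: the dictionary layout and function name are hypothetical, and the totals ignore ramp times between temperatures, which depend on the thermocycler.

```python
# Each step is (temperature in degrees C, hold time in seconds).
DAPHNIA_16S = {
    "initial": [(93, 150), (55, 60), (72, 120)],
    "cycles": 40,
    "cycle_steps": [(93, 60), (55, 60), (72, 120)],
}

BACTERIAL_16S = {
    "initial": [(50, 120), (95, 600)],
    "cycles": 40,
    "cycle_steps": [(95, 15), (60, 60)],
}


def total_hold_seconds(program: dict) -> int:
    """Sum all hold times of a program, excluding ramp times."""
    initial = sum(seconds for _, seconds in program["initial"])
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return initial + program["cycles"] * per_cycle


if __name__ == "__main__":
    for name, program in [("Daphnia 16S", DAPHNIA_16S),
                          ("bacterial 16S", BACTERIAL_16S)]:
        minutes = total_hold_seconds(program) / 60
        print(f"{name}: {minutes:.1f} min")
```

Under these assumptions the Daphnia marker needs about 165.5 min and the bacterial marker about 62 min of hold time, so both screens fit within a single working day.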

| Library preparation and sequencing
Paired-end library construction was conducted for the 18 WGA samples and one unamplified CTAB sample with the Nextera XT DNA Library Preparation Kit (Illumina). Two library preparation kits were used on the same WGA samples to test which library kit could be best adapted to these samples. Five WGA samples (500 ng as input DNA) were fragmented using the M220 Focused-ultrasonicator (Covaris) and prepared with the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (New England Biolabs). In addition, five WGA samples (500 ng as input DNA) and one unamplified CTAB sample were prepared with the NEBNext® Ultra™ II FS DNA Library Prep Kit for Illumina®, which includes an enzymatic DNA fragmentation step.
The obtained fragment length was measured prior to sequencing on a 4200 TapeStation (Agilent) with the High Sensitivity D5000 kit. Using only the eight WGA samples that were identified as largely Daphnia sequences in the previous low-coverage MiSeq sequencing step as well as one unamplified CTAB sample (9 total), new libraries were prepared with the NEBNext® Ultra™ II FS DNA Library Prep Kit for Illumina®. Then, 150 bp paired-end sequencing was generated on the Illumina NovaSeq 6000 platform as part of a previous study (Nickel et al., 2021). This whole-genome data was used here to have sufficient coverage to assess genome coverage and is available from the European Nucleotide Archive (accession numbers: ERR4610186-ERR4610192, ERR4610229, and ERR5235052). The successful sample EIC_13.2 could not be sequenced again because no amplification product was left.

| Sequencing analysis and genotyping
The quality of raw and trimmed reads was assessed using FastQC v0.11.7 (Andrews, 2010). Trimming and quality filtering of the 30 total MiSeq datasets was performed using Trimmomatic v0.38 (Bolger et al., 2014) with the following parameters: TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:120. To assess contamination in the WGA samples, FastQ Screen v0.14.0 with the BWA mapping option was used (Wingett & Andrews, 2018). A custom database was built to map trimmed reads against possible contaminants

TABLE 1 Summary of all samples for egg quality, the complete method used for each egg, and PCR screening results (16S Daphnia/16S bacteria: Amplified product visible on a gel).
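The trimming settings reported above translate into a Trimmomatic paired-end call along the following lines. This is a sketch under stated assumptions: the `trimmomatic` executable name (a wrapper script provided by some installations; others call the jar through `java -jar`), the file names, and the output naming scheme are all hypothetical. Only the three trimming steps and their order are taken from the text.

```python
def trimmomatic_pe_args(read1: str, read2: str, out_prefix: str) -> list[str]:
    """Build an argument list for a Trimmomatic paired-end run.

    Paired-end mode expects two inputs followed by four outputs
    (paired and unpaired reads for each mate), then the trimming steps.
    """
    return [
        "trimmomatic", "PE",          # assumed wrapper name, paired-end mode
        read1, read2,
        f"{out_prefix}_R1_paired.fq.gz", f"{out_prefix}_R1_unpaired.fq.gz",
        f"{out_prefix}_R2_paired.fq.gz", f"{out_prefix}_R2_unpaired.fq.gz",
        "TRAILING:15",                # trim trailing bases below quality 15
        "SLIDINGWINDOW:4:15",         # cut when 4-base window mean drops below 15
        "MINLEN:120",                 # drop reads shorter than 120 bp
    ]


if __name__ == "__main__":
    # Hypothetical file names for one sample.
    args = trimmomatic_pe_args("egg_R1.fq.gz", "egg_R2.fq.gz", "egg_trimmed")
    print(" ".join(args))
    # To execute, pass `args` to subprocess.run(args, check=True).
```

Trimmomatic applies the steps in the order given, so MINLEN is placed last to filter on the final read length.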

| Whole-genome amplification
The amplification step of genomes derived from resting eggs yielded 3.7-10.4 μg and 9.4-41 μg DNA per reaction for the REPLI-g Mini and REPLI-g Single Cell Kit, respectively, and generated very long fragments

FIGURE 1 Experimental design workflow for multiple protocols of whole-genome amplification of isolated resting eggs, library preparation, sequencing, and bioinformatic analysis.

| Contamination and read mapping of MiSeq datasets
The trimmed reads were analyzed with FastQ Screen to assess possible contamination. We expected that for samples with successful To properly analyze reads that mapped to multiple genomes with FastQ Screen and verify these results, the sequences were mapped separately to the D. galeata genome. We clearly identified presented for both protocols in Figure 2. In total, nine Daphnia genomes from resting eggs were successfully amplified and sequenced, which corresponds to a 50% success rate (9 of 18 eggs), in addition to one Daphnia genome from a pooled unamplified DNA sample.

| Effect of different protocols used
Out of the nine failed samples, only two were amplified using the REPLI-g Mini Kit while seven were amplified using the REPLI-g Single Cell Kit (normal and increased sample protocol). The latter yielded more DNA but produced lower quality WGA product and we were only able to successfully amplify samples using two protocol Three eggs could be successfully amplified after keeping them intact before amplification and six that were manually crushed with an insect needle. However, because of their fragility, the membrane of some of the intact eggs was most likely also punctured during isolation and transfer.
Three eggs out of 10 tested eggs were successfully amplified using the REPLI-g Single Cell Kit, one using −20°C storage and leaving the eggs intact and two using −80°C storage and crushing the eggs. Six out of 8 tested eggs could be amplified using the REPLI-g Mini Kit, two each using −80°C storage and either leaving the eggs intact or crushing them and two using −20°C storage and leaving the eggs intact.
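The per-kit success rates follow directly from the counts reported above. The short tally below is only a convenience sketch; the dictionary and function names are hypothetical, and the counts are the ones given in the text.

```python
# (successfully amplified eggs, tested eggs) per kit, as reported in the text.
RESULTS = {
    "REPLI-g Single Cell": (3, 10),
    "REPLI-g Mini": (6, 8),
}


def success_rate(successes: int, tested: int) -> float:
    """Fraction of tested eggs that were successfully amplified."""
    if tested <= 0:
        raise ValueError("number of tested eggs must be positive")
    return successes / tested


def overall_rate(results: dict) -> float:
    """Success rate pooled over all kits."""
    total_success = sum(s for s, _ in results.values())
    total_tested = sum(t for _, t in results.values())
    return success_rate(total_success, total_tested)


if __name__ == "__main__":
    for kit, (successes, tested) in RESULTS.items():
        print(f"{kit}: {success_rate(successes, tested):.0%}")
    print(f"Overall: {overall_rate(RESULTS):.0%}")
```

With two eggs per protocol combination, these percentages describe the pooled kit-level outcome and should not be read as estimates for individual combinations.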
The different library kits used generated a similar number of reads per library and a higher proportion of reads were retained after trimming using the NEB kits. In addition, the NEB kit yielded more consistent library concentrations and no failed libraries due to the protocol having more options to customize for different DNA input concentrations.

| PCR contamination screening
Two different 16S PCR markers were used to assess the quality of the WGA product before sequencing and to compare these results to the results achieved by sequencing and mapping the reads. Sanger sequencing and BLAST search of the Daphnia 16S PCR fragments confirmed that all were of D. galeata or D. longispina mitochondrial origin.
It is to be noted that Sanger sequencing was only conducted here for further diagnosis and is not necessary for the contamination screening.
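The pre-sequencing screen can be thought of as a simple filter on the two gel results per sample. The sketch below assumes a conservative rule that is not spelled out in the text: a sample proceeds to library preparation only if the Daphnia 16S band is visible and no bacterial 16S band is visible. The class, the function, and the sample names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrScreen:
    sample: str
    daphnia_16s_band: bool    # Daphnia 16S product visible on the gel
    bacterial_16s_band: bool  # bacterial 16S product visible on the gel


def proceed_to_library_prep(screen: PcrScreen) -> bool:
    """Assumed rule: Daphnia band present and bacterial band absent."""
    return screen.daphnia_16s_band and not screen.bacterial_16s_band


if __name__ == "__main__":
    # Hypothetical gel results, for illustration only.
    screens = [
        PcrScreen("egg_A", daphnia_16s_band=True, bacterial_16s_band=False),
        PcrScreen("egg_B", daphnia_16s_band=True, bacterial_16s_band=True),
        PcrScreen("egg_C", daphnia_16s_band=False, bacterial_16s_band=True),
    ]
    keep = [s.sample for s in screens if proceed_to_library_prep(s)]
    print(keep)
```

A laboratory that tolerates a faint bacterial band would need to replace the boolean with a graded score; the rule here is deliberately strict.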

| Read depth and coverage of NovaSeq datasets
While the MiSeq-generated data worked well to identify contamination quickly and at a lower cost, the low genome coverage (average 0.1x) was not sufficient to assess differences between unamplified and amplified samples and different WGA protocols used. To better compare the mapping rate between samples, assess the read coverage and variant calling, we thus used the sequences of eight successful amplification samples and the unamplified, pooled sample with a higher coverage (0.60-57.05x, Table A2) obtained in another study (Nickel et al., 2021).
As MDA-WGA can lead to non-uniform amplification of the genome (Pinard et al., 2006)  These were discarded during trimming and it subsequently resulted in lower coverage than the other samples (0.6x).
The normalized read depth for the unamplified sample M5 shows uniform coverage across the genome with few regions having low or very high coverage (Figure 3). In general, two samples that were prepared with the same protocol show very similar patterns of read coverage. The three samples that were prepared with the REPLI-g Single Cell Kit show a uniform coverage similar to the unamplified M5 sample, with most regions having a coverage above 5 and few regions with coverage above 50 that could point to overamplification of specific regions. For the samples that were amplified with the REPLI-g Mini Kit, most regions have coverage above 1 but some regions show much lower or higher coverage, especially sample EIC_14.
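One common way to make read depth comparable between samples with very different sequencing effort is to divide the mean depth of each window by the genome-wide mean depth of that sample. The sketch below illustrates this idea; it is an assumption that normalization was done this way, and the function name, the non-overlapping windows, and the toy depth values are hypothetical.

```python
def normalized_window_depth(depths: list[float], window: int) -> list[float]:
    """Mean depth per non-overlapping window divided by the overall mean.

    A value near 1 means the window is covered like the genome average;
    values far above 1 may indicate overamplified regions.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not depths:
        return []
    overall_mean = sum(depths) / len(depths)
    if overall_mean == 0:
        raise ValueError("overall mean depth is zero")
    normalized = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        normalized.append((sum(chunk) / len(chunk)) / overall_mean)
    return normalized


if __name__ == "__main__":
    # Toy per-base depths, for illustration only.
    print(normalized_window_depth([10, 10, 30, 30], window=2))
```

In the toy example the overall mean depth is 20, so the two windows have normalized depths of 0.5 and 1.5.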

| Variant calling and heterozygosity
The invariant data set included 130,950,194 sites across the 9 samples. To assess whether we find downwardly biased estimates of heterozygosity in the amplified samples we compared the proportion of heterozygous genotype calls in sliding windows across the genome to the unamplified sample M5 (Figure 4). The heterozygosity across the genome in the unamplified sample was even with few outlier windows and the genome-wide average heterozygosity was 0.00934. We find very similar genome-wide patterns for the eight amplified samples and no general trend of loss of heterozygosity, with three samples having higher average heterozygosity compared with M5 and five having lower heterozygosity (Table A3). However, as the resting eggs were sampled from a natural population some differences in heterozygosity between two samples using the same protocol are expected.
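The proportion of heterozygous genotype calls per window can be computed from per-site genotypes once invariant sites are included in the call set. The following is a minimal sketch for a single sample and a single contig; the function names, the VCF-style genotype strings, and the toy data are assumptions for illustration, and the window and step sizes used in the study are not implied.

```python
def _alleles(genotype: str) -> list[str]:
    """Split a VCF-style genotype such as '0/1' or '0|1' into alleles."""
    return genotype.replace("|", "/").split("/")


def is_called(genotype: str) -> bool:
    """A diploid genotype is called when neither allele is missing ('.')."""
    alleles = _alleles(genotype)
    return len(alleles) == 2 and "." not in alleles


def is_het(genotype: str) -> bool:
    """A called genotype is heterozygous when its two alleles differ."""
    alleles = _alleles(genotype)
    return is_called(genotype) and alleles[0] != alleles[1]


def windowed_heterozygosity(calls, window: int, step: int):
    """Return (start, end, proportion of het calls) per sliding window.

    `calls` is a list of (1-based position, genotype) for one contig.
    Windows without any called genotype get None instead of a proportion.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not calls:
        return []
    last_position = max(pos for pos, _ in calls)
    results = []
    start = 1
    while start <= last_position:
        end = start + window - 1
        called = [gt for pos, gt in calls
                  if start <= pos <= end and is_called(gt)]
        if called:
            het = sum(is_het(gt) for gt in called)
            results.append((start, end, het / len(called)))
        else:
            results.append((start, end, None))
        start += step
    return results


if __name__ == "__main__":
    # Toy genotypes including invariant (0/0) and missing (./.) sites.
    toy = [(1, "0/0"), (2, "0/1"), (3, "./."),
           (4, "1/1"), (5, "0/1"), (6, "0/0")]
    for row in windowed_heterozygosity(toy, window=3, step=3):
        print(row)
```

Dividing by called sites rather than by window length keeps windows with many missing genotypes from appearing artificially homozygous, which matters when comparing amplified samples with uneven coverage to the unamplified control.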

| DISCUSSION
Egg banks of zooplankton allow researchers to track long-term genetic and ecological variation within an ecosystem and provide insight into past populations by hatching long-dormant eggs or using genetic markers (Alric et al., 2016; Burge et al., 2018; Frisch et al., 2014). However, these studies are still limited by the reduction in egg viability with sediment age and the low amount of high-quality DNA in the eggs. The goal of this study was to optimize the whole-genome sequencing of D. longispina species resting eggs and establish a more reliable WGA method using 18 single resting eggs isolated from sediment. Hatching eggs from the resting egg bank and establishing clonal lines in the lab can be unpredictable, and success rates depend on the species, the water bodies where sediment was collected, and the hatching conditions used (Radzikowski et al., 2018). In addition, D. longispina species hybrid resting eggs show lower hatching success and survival rates than their parental species (Schwenk et al., 2001). This introduces a bias in which hybrids appear less frequent than they are when working on admixed populations from the resting egg bank. We suggest that our method could help to obtain more accurate genomic data from hybrid populations in other studies (Nickel et al., 2021).
In our study, we included the WGA method presented in Lack et al. (2018) using Daphnia pulicaria resting eggs where they were able to sequence one out of three resting eggs. We did not achieve successful amplification for the two tested D. galeata eggs using the same protocol (EIC_2.1 and EIC_2.2). In addition, Lack et al. (2018) did not use any qualitative diagnostic to assess the amplified Daphnia  and may be very low in specific lakes because of the poor condition of resting eggs (Marková et al., 2006). The contamination screening helps to identify potentially successful amplification of Daphnia DNA before library preparation and sequencing and to generate genomic data under these difficult conditions. The bacterial markers can be used to identify bacterial contamination in WGA of all other species, while the Daphnia markers work for all Cladocera species, many of which also produce resting stages (Vandekerkhove et al., 2005). One challenge with whole-genome amplification was contamination, which most likely stemmed from problems during the amplification step. Lab protocols to minimize contamination were used, but the high sensitivity of the WGA kits could lead to the amplification of DNA from the wrong cells present in the sample. A critical step is to use high-quality undamaged eggs that show no signs of degradation. In eggs from older sediment layers, this is often more difficult and will be tested in a different study. The benefits of this method include the potential to go back decades to centuries because DNA is preserved longer than eggs that can be reliably hatched (Limburg & Weider, 2002). However, some amplifications of resting eggs failed, and we were not able to identify one or multiple specific organisms as major contaminants because most reads could not be mapped.
Instead, we hypothesize that these unknown sequences could be caused by the phi29 polymerase performing non-templated DNA synthesis which is a known phenomenon in MDA-WGA and produces "junk" DNA possibly when the egg is already degrading or the amount of DNA present is too small (Nelson, 2014). This seems to be a more frequent problem using the REPLI-g Single Cell Kit where only three out of ten samples were successfully amplified.
In conclusion, the most appropriate complete protocol we tested included using the REPLI-g Mini Kit, storing eggs at −80°C, leaving the eggs intact, and using the NEB library kit. After the resting egg is removed from the ephippium, it is extremely fragile, and immediately freezing it at −80°C seems to be a crucial step to slow DNA degradation. Nevertheless, the high biological variability of the resting eggs and the relatively small number of eggs tested for each protocol make it difficult to draw more general conclusions.
The possible shortcomings of WGA methods include the amplification of contaminant DNA instead of or in addition to the template DNA (Thoendel et al., 2017), and increased costs for sample preparation. Currently, the cost for the suggested REPLI-g Mini Kit is ~$8 per sample. The amplification of contaminant DNA remains an issue; however, our contamination screening, when applied at a larger scale, would lead to substantial cost savings by markedly reducing the number of contaminated samples being processed further and sequenced. While sequencing itself has become extremely low-cost, library preparation remains costly. Prices for reagents, kits, labor, and sequencing services vary considerably between countries and are further influenced by the scale of purchasing. We, therefore, refrain from providing exact cost calculations.
When using WGA strategies, it is also important to consider the impact of the quantity of input DNA, which can lead to downwardly biased estimates of heterozygosity and therefore genotyping bias (de Medeiros & Farrell, 2018). This study and others that use very small amounts of input DNA (Campbell et al., 2020; Cruaud et al., 2018; O'Grady et al., 2021) indicate that MDA-WGA does not introduce amplification bias that affects SNP genotyping.
It is also suitable for structural variant calling with the exception of inversions (Lack et al., 2018).
In summary, our method will allow the resequencing of resting eggs from different sediment layers to characterize genomic changes through time in the D. longispina species complex. In a broader context, WGA could also be used for resting stages of other organisms with low amounts of DNA such as other Cladocera taxa, rotifers, or diatoms to gain a more complete understanding of freshwater ecosystems.

ACKNOWLEDGMENTS
We would like to thank Fynn Eilers and Jennifer Drechsler for their support in the molecular laboratory and with sampling. We are grateful to the Fischer group at the Institute of Food Chemistry (University of Hamburg) for providing us access to the Illumina MiSeq instrument. Open Access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

TABLE A1
Accession number for genomes included in the custom database to assess contamination with FastQ Screen.